NSF PAR Search | NSF Public Access Repository

History-Guided Video Diffusion

Song, Kiwhan; Chen, Boyuan; Simchowitz, Max; Du, Yilun; Tedrake, Russ; Sitzmann, Vincent (July 2025, 2025 Forty-Second International Conference on Machine Learning)

Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this technique to video diffusion, which generates video conditioned on a variable number of context frames, collectively referred to as history. However, we find two key challenges to guiding with variable-length history: architectures that only support fixed-size conditioning, and the empirical observation that CFG-style history dropout performs poorly. To address this, we propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT. We show that its simplest form, vanilla history guidance, already significantly improves video generation quality and temporal consistency. A more advanced method, history guidance across time and frequency, further enhances motion dynamics, enables compositional generalization to out-of-distribution history, and can stably roll out extremely long videos.
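For readers unfamiliar with classifier-free guidance, the following is a minimal sketch of the standard CFG combination rule, not the paper's history guidance method. It assumes a model has already produced two noise predictions for the same sample, one without conditioning and one with it; the function name `cfg_combine` and the parameter `guidance_scale` are illustrative and do not come from the abstract.

```python
from typing import List, Sequence


def cfg_combine(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend unconditional and conditional noise predictions (standard CFG).

    guidance_scale = 0 returns the unconditional prediction,
    guidance_scale = 1 returns the conditional prediction, and
    values above 1 extrapolate further toward the conditional one.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    # Toy predictions for a two-element sample (made-up numbers).
    print(cfg_combine([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0))
```

In a video setting, the conditional prediction would presumably be the one computed with history frames available; how the unconditional branch is obtained is exactly the part the abstract says is hard with variable-length history.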
Free, publicly-accessible full text available July 17, 2026
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Chen, Boyuan; Martí Monsó, Diego; Du, Yilun; Simchowitz, Max; Tedrake, Russ; Sitzmann, Vincent (September 2024, Neural Information Processing Systems 2024)

This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones. Our approach is shown to combine the strengths of next-token prediction models, such as variable-length generation, with the strengths of full-sequence diffusion models, such as the ability to guide sampling to desirable trajectories. Our method offers a range of additional capabilities, such as (1) rolling out sequences of continuous tokens, such as video, with lengths past the training horizon, where baselines diverge, and (2) new sampling and guiding schemes that uniquely profit from Diffusion Forcing's variable-horizon and causal architecture, and which lead to marked performance gains in decision-making and planning tasks. In addition to its empirical success, our method is proven to optimize a variational lower bound on the likelihoods of all subsequences of tokens drawn from the true joint distribution.
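The idea of independent per-token noise levels can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: tokens are scalars, the linear signal schedule is hypothetical, and the names `noise_tokens_independently` and `num_levels` are invented for this example.

```python
import math
import random
from typing import List, Tuple


def noise_tokens_independently(
    tokens: List[float], num_levels: int, rng: random.Random
) -> Tuple[List[float], List[int]]:
    """Corrupt each token with its own, independently sampled noise level.

    Level 0 leaves a token clean; level num_levels replaces it with pure
    Gaussian noise. The linear schedule below is an assumption made for
    illustration only.
    """
    noisy: List[float] = []
    levels: List[int] = []
    for x in tokens:
        k = rng.randrange(num_levels + 1)   # independent level per token
        signal = 1.0 - k / num_levels       # fraction of signal variance kept
        eps = rng.gauss(0.0, 1.0)           # fresh Gaussian noise per token
        noisy.append(math.sqrt(signal) * x + math.sqrt(1.0 - signal) * eps)
        levels.append(k)
    return noisy, levels


if __name__ == "__main__":
    rng = random.Random(0)
    noisy, levels = noise_tokens_independently([1.0, 2.0, 3.0, 4.0], 10, rng)
    print(levels)
    print(noisy)
```

A denoiser trained on such inputs would receive the per-token levels alongside the noisy tokens, which is what allows past tokens to be kept nearly clean while future tokens are still heavily noised.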
Full Text Available
Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation

Yang, Lujie; Dai, Hongkai; Shi, Zhouxing; Hsieh, Cho-Jui; Tedrake, Russ; Zhang, Huan (July 2024, International Conference on Machine Learning)

Full Text Available
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

Chi, Cheng; Xu, Zhenjia; Pan, Chuer; Cousineau, Eric; Burchfiel, Benjamin; Feng, Siyuan; Tedrake, Russ; Song, Shuran (March 2024, Robotics: Science and Systems)

Full Text Available
Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

Tian, Yi; Zhang, Kaiqing; Tedrake, Russ; Sra, Suvrit (January 2023, Conference on Learning for Dynamics and Control)

We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.
Full Text Available
FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis

Sinha, Aman; Tedrake, Russ (January 2020, Proceedings of the 37th International Conference on Machine Learning)
Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges. First, to generate a realistic, diverse set of opponents, we develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo. Second, we propose a distributionally robust bandit optimization procedure that adaptively adjusts risk aversion relative to uncertainty in beliefs about opponents’ behaviors. We rigorously quantify the tradeoffs in performance and robustness when approximating these computations in real-time motion-planning, and we demonstrate our methods experimentally on autonomous vehicles that achieve scaled speeds comparable to Formula One racecars.
Full Text Available